Predisposing deleterious variants in the cancer-associated human kinases in the global populations

Human kinases play essential and diverse roles in the cellular activities of maintaining homeostasis and growth. Genetic mutations in the genes encoding the kinases (or phosphotransferases) have been linked with various types of cancers. In this study, we cataloged mutations in 500 kinases genes in >65,000 individuals of global populations from the Human Genetic Diversity Project (HGDP) and ExAC databases, and assessed their potentially deleterious impact by using the in silico tools SIFT, Polyphen2, and CADD. The analysis highlighted 35 deleterious non-synonymous SNVs in the ExAC and 5 SNVs in the HGDP project. Notably, a higher number of deleterious mutations was observed in the Non-Finnish Europeans (26 SNVs), followed by the Africans (14 SNVs), East Asians (13 SNVs), and South Asians (12 SNVs). The gene set enrichment analysis highlighted NTRK1 and FGFR3 being most significantly enriched among the kinases. The gene expression analysis revealed over-expression of NTRK1 in liver cancer, whereas, FGFR3 was found over-expressed in lung, breast, and liver cancers compared to their expression in the respective normal tissues. Also, 13 potential drugs were identified that target the NTRK1 protein, whereas 6 potential drugs for the FGFR3 target were identified. Taken together, the study provides a framework for exploring the predisposing germline mutations in kinases to suggest the underlying pathogenic mechanisms in cancers. The potential drugs are also suggested for personalized cancer management.


Introduction
Among all the biological molecules, proteins are the key components involved in diverse cellular processes such as metabolism and signal transduction pathways.A specialized category of phosphotransferase proteins called "kinases" are essential for the cellular vital activities through a chemically reversible and swift phenomenon called protein phosphorylation [1][2][3][4].Protein phosphorylation is a post-translational modification managed by specific domains (e.g., the SH2 domain) activating/deactivating many enzymes and receptors by means of kinases and phosphatases, acting as molecular switches [5,6].Results from the human genome project (HGP) suggested the presence of at least 518 protein kinases in the human genome [7].These kinases are involved in phosphorylating at least 70% of the total human proteins [8,9].This indicates that each protein kinase and phosphatase involves roughly hundreds of phosphorylation events in complex enzymatic networks.
Several families of kinases, such as the rare but critical tyrosine kinases [10] and the major serine/threonine cycle-dependent kinases [11][12][13], aurora kinases [14], mTOR [15], and mitogen-activted protein kinases [16], have been reported so far.Despite their normal physiological role, kinase-mediated signaling is well known in various diseases such as cancer, and often causes the disease itself or drives its progression.To detect mutationally activated kinases in cancer signaling pathways, several large-scale cancer genome sequencing projects have been conducted such as the Cancer Genome Atlas (TCGA) [17].Findings of such projects have played a significant role in the development of inhibitors of oncogenic kinases for cancer therapy since the success of imatinib for Bcr-Abl-positive chronic myeloid leukemia (CML) patients and gastrointestinal stromal tumors (GIST) [18,19].Besides oncological issues, kinase dysregulation has also been revealed in other disorders, including metabolic (in particular diabetes), immune, neurological and infectious diseases [20][21][22][23].
Given the key importance of kinases in cellular signaling pathways involved in human cancers, identifying and profiling functional missense mutations is a rational approach to designing better diagnostic and therapeutic approaches.With recent advances in the highthroughput sequencing technologies, a large number of genetic variations have been identified.The identification of pathological genetic variants has proven crucial in gene level studies and formulating targeted therapies.However, efficient and effective identification of functionally significant genetic variations associated with the diseased status is a time consuming as well as an expensive task.For this purpose, various bioinformatics tools, designed on the basis of recent findings in protein structure and evolutionary biology may prove useful in predicting the functional importance of genetic mutations [24][25][26][27].Over the past few years, several in silico studies have been attempted to screen the alteration within the protein coding regions of genes, and have highlighted the efficiency of such bioinformatics tools to be efficient and effective platforms for indicating which role these mutations have in the pathology of diseases [28][29][30].
Non-synonymous single nucleotide polymorphisms (nsSNPs) are the simplest form of genetic variations occurring in the coding region and alter the encoded amino acid in the resultant protein [31].The nsSNPs-induced alteration in the encoded amino acid may further result in changing the physiochemical properties of native proteins and disrupting their molecular function [32,33].Several computational biology tools facilitate the screening of deleterious mutations by taking into account the sequence conservation among species, structural features, and physiochemical properties of proteins [34,35].Although there are a great many studies investigating the impact of pathogenic mutations in different cancers cohorts, yet the predisposing germline deleterious mutations for cancers in healthy populations have not been investigated so far.In this study, taking advantage of the publicly available genome-wide mutational database of global populations, such as the Exome Aggregation Consortium (ExAC) [36,37] and Human Genome Diversity Project (HGDP) [38], we used in an in silico approach to analyze highly deleterious mutated kinases throughout the human kinome relevant to carcinogenesis.The aim of our study is to identify the predisposing pathogenic kinase variants which are likely to be involved in cancer development in global populations.

Preparation of gene lists
The genes encoding kinases in humans were obtained from the protein kinase complement of the human genome [7].The HUGO Gene Nomenclature (HGNC) approved names and symbols of all the genes were confirmed through ENSEMBLE and GENCODE [39,40].The genes' coordinates (start and end positions on the chromosomes) were retrieved from the ENSEM-BLE release v94 using the hg38 build.

Database for variants extraction
The complete kinome genes list was passed through two databases of germline mutations i.e., Exome Aggregation Consortium (ExAC, n = 60,706) release 1.0, and Human Genome Diversity Project (HGDP, n = 1,050) [41,42] using the bcftools v.1.13.The bcftools 'view' module was used to subset the variants from the ExAC and HGDP data resulting in a vcf file containing variants in the kinome genes only.The subset variants were re-accessed for their association with cancer through the Catalogue of Somatic Mutation in Cancer (COSMIC) database v87 release [43].The COSMIC database resulted in two sets of kinase gene variants, specifically, variants in the coding and the non-coding regions.

Variants annotation and filtration
Variants annotated from the COSMIC database were subjected to Combined Annotation Dependent Depletion (CADD) scores for measuring the level of deleteriousness [44].CADD scores (C-score) threshold were chosen such that variants with C-scores <15 was considered least deleterious, C-scores 15-19.99 were considered deleterious, and variants with C scores �20 were considered highly deleterious [45].For reconfirmation of the deleterious variants, we further annotated these variants with ANNOVAR to determine the SIFT and PolyPhen [46,47] scores.Finally, the variants predicted as deleterious by the SIFT and PolyPhen2 tools and having CADD C-scores �20 were selected for further analysis, and the genes harboring such variants were termed as deleterious genes, as described in our previous study [48].

Gene set enrichment analysis
The underlying biological functions of the 500 kinome genes were evaluated through the PAN-THER version 14 [49].PANTHER is used as a comprehensive resource for the classification of genes according to their evolutionary history and their functions [50,51] based on Gene ontology (GO) [52,53].A list of highly deleterious genes was subjected to pathway analysis using the Reactome, a curated database [54,55], and KEGG database.The gene set file was analyzed in an automation mode so that all the genes were in the same line and delimited by commas.

Tissue specific expression of targeted genes in normal and diseased state
To study the functional effects of mutations linked with phenotype in various cancers, we performed a systemized analysis.Here, we selected the most significant genes to determine their expression in normal and diseased conditions, such as, breast ductal carcinoma, non-small cell lung cancer, prostate adenocarcinoma, hepatocellular carcinoma,, and pancreatic ductal carcinoma (top-rated cancer according to incidence and mortality globally) [56].For this purpose, we used GTEx portal (https://www.gtexportal.org/home/)which is a valuable public resource designed to investigate the regulation and expression of genes specific to particular tissues.The project has collected samples from 53 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including RNA-seq [57].We also used the Expression Atlas database (http://www.ebi.ac.uk/gxa/home) [58] which provides powerful methods to find information about genes and expression of protein in diverse species, in the respective biological conditions.The workflow of the study has been depicted in Fig 1.

Pharmacogenomics of the selected genes
The association of genetic variants with drug sensitivity and resistance mechanisms was also assessed by using the Pharmacogenomics software.To identify the drugs that target the prioritized genes, we first used Connectivity Map from Broad Institute (https://clue.io,data version 1.1.1.2,software version 1.1.1.39)[59].This resource provides curated datasets generated from testing well-annotated genetic and small molecular perturbations in a variety of cell lines.We exported the compounds and their mechanisms of action.To identify newer drugs and associations that may not be included in the Connectivity Map, we evaluated the Chong et al. ( 2017) study for drugs that target the prioritized genes in this study (http://clincancerres.aacrjournals.org/content/23/1/204.long).

Gene ontology
The entire human kinome consisted of 500 genes primarily involved in signal transduction pathways regulating the vital biological processes, such as, cell growth, survival, apoptosis, proliferation, differentiation, metabolic processes, anatomical structure formation, cellular compartment organization and genesis, developmental processes, and organismal processes (S1 Fig).

Kinome annotation, filtration and prioritization
All the SNVs in intronic, exonic, and flanking regions of the kinome genes under study, extracted from the HGDP and ExAC datasets, were analyzed for mutational load by applying our analysis pipeline (Table 1).The total number of single nucleotide substitution sites in the kinome genes was higher in the ExAC database (231,867) than in the HGDP database (9,358).However, there was higher number of per person SNV sites in the HGDP (8.91) than in the ExAC subset (3.82).For assessing the likely association of the subset SNVs with cancers, we further used COSMIC datasets to calculate the proportions of variants in the non-coding and coding regions, as well as the functional consequences such as synonymous, nonsynonymous, frameshift or non-frameshift in coding regions of both the datasets.Higher numbers of the annotated, non-coding, and coding variants was found in the ExAC database (4867, 952, 3915) as compared to the HGDP dataset (121, 61, 60).Notably, a higher non-synonymous/ synonymous ratio was observed in HGDP SNVs (4.57) vs the ExAC SNVs (2.59).No stop-gain or stop-loss mutation was found in the HGDP variants, whereas 99 stop-gain and 108 stoploss variants were observed in the ExAC.The number of frameshift variants was also found higher in ExAC datasets (233) than in the HGDP (15).A total of 39 and 1,405 variants were predicted to be deleterious by the SIFT and PolyPhen2 tools in the HGDP and ExAC datasets, out of which 6 and 36 variants were found as deleterious by the three in silico tools (SIFT, PolyPhen2, and CADD score�20) in the HGDP and ExAC datasets There was an overlapping of a few variants between three categories of cancer genes (Fig 2, S1 Table ).Population-wise, the Non-Finnish Europeans contained the highest number of deleterious variants (26 SNVs), followed by the Africans (14 SNVs), East Asians (13 SNVs), South Asians (12 SNVs), and Americans (9 SNVs).We highlighted the genomic locations of the genes harboring the deleterious variants associated with different types of cancers with the help of a phenogram (Fig 3).This analysis highlighted the loci of chromosomes 9 and 19 to be rich in deleterious variants involved in different types of cancers.

Functional consequences of deleterious genes by the enrichment analysis
To further understand the function and mechanism of the identified genes, functional and pathway enrichment analyses were performed using the PANTHER database.The GO term enrichment analysis showed that in the biological processes-associated category, the deleterious gene variants were enriched in protein phosphorylation, phosphate-containing compound metabolic process, phosphorus metabolic process, cellular protein modification process and protein modification process (S2 Table ).Moreover, for molecular function, the deleterious genes were enriched in protein kinase activity, phosphotransferase activity, alcohol group as acceptor, kinase activity, transferase activity, transferring phosphorus-containing groups, ATP binding, adenyl ribonucleotide binding, drug binding, and catalytic activity, acting on a protein (S3 Table ).Furthermore, the PANTHER pathway analysis also showed that deleterious gene variants were significantly enriched in cellular processes, such as, receptor complex, nucleus and nuclear body, transferase complex, transferring phosphorus-containing groups, cytosol, organelle, protein kinase complex, and nucleoplasm part (S4 Table ).

Comparison of NTRK1 and FGFR3 gene expression in normal and cancerous tissues
In the present study, the expression of NTRK1 in various cancers and their related normal tissues was performed through the Expression Atlas.The transcripts per million reads (TPM) level of NTRK1 was found to be lower in the lung, breast, prostate and pancreas cancers when compared to their normal state.However, an increased TPM of NTRK1 was found in liver cancer when compared to normal liver tissue (Fig 4).We evaluated the expression levels of FGFR3 in various fetal cancers and their related normal tissues through the Expression Atlas.
According to the analysis, a higher expression of FGFR3 was recorded in the lung, breast, and liver cancers, whereas its expression was significantly decreased in prostate and pancreas cancerous states compared with the respective normal tissues (Fig 4).The Expression Atlas analysis indicates the expression dependent activity of both NTRK1 and FGFR3 in various organs under physiologically normal and cancerous states.

Pharmacogenomics of NKTR1 and FGFR3
This analysis validated some of the drugs extracted from Connectivity Map and provided additional drugs that might be effective in the cancer treatment.We combined drugs from both Connectivity Map and Chong et al. study to generate a final list of drugs that target NTRK1 and FGFR3 (Tables 3 & 4).There were 13 drugs that target the NTRK1, and 6 drugs which target the FGFR3 protein.

Discussion
The predisposing pathogenic mutations in the humans enhance the susceptibility to the associated diseases.Here we prioritized potential pathogenic mutations in the global populations by using the publicly available database of human genetic variations (ExAC, and HGDP) by employing different in silico tools which predict the effect of genetic variations.We determined the deleterious mutations in the genes of kinase category already associated with different cancers.To the best of our knowledge this is the first report to present the predisposing potential deleterious mutations in the kinome genes.The presence of large number of predisposing detrimental variations can impact the overall fitness of a population [60,61].In this study, the number of deleterious variants in the kinome genes were found higher in non-Finnish Europeans, and Africans compared with other populations of the world.This finding correlates with the previous reports where a significant  percentage of uncommon variations in populations from Europe and Africa were deleterious.Also, there was a marginal difference in the number of rare frequency deleterious variants (derived allele frequency (DAF) < 1%) in the European-Americans and African-Americans [62].One possible explanation for the relatively higher percentage of high-frequency detrimental variations is due to the well-known bottleneck that the European populations went through, which may have caused harmful mutations to drift to high frequencies.Also, it has been shown in a recent study that an individual's deleterious genetic variations in cancer predisposing genes-sets are linked to higher risk, earlier onset age, elevated M1 macrophages in tumors, and higher tumor mutational burden in particular malignancies [63].
Through the genes set enrichment analysis, we prioritized two genes NTRK1 and FGFR3 having the most significant interactions.NTRK1 gene encodes for a protein known as tropomyosin receptor kinase A (TRKA) involved in various processes, especially in the development and survival of nerve cells.Its altered expression has been reported to result in pathogenesis such as cancers.Previously, a more aggressive subset of soft tissue tumors with distinctive LPF-like morphology were found associated with NTRK1-associated genetic abnormalities [64].In other reports, NTRK1 rearrangements in metastatic gastrointestinal cancer, colorectal cancer, glioblastoma, and non-small cell lung cancer patients were observed [65,66].This rearrangement might be associated with altered expression of the gene.Herein, the lower gene expression of NTRK1 in the cancer tissues might indicate suppression of its associated pathways leading to cancer.
The FGFR3 gene encodes a protein known as fibroblast growth factor receptor 3. Its expression also varies between normal and diseased conditions such as cancer.The FGFR3 seems to have diverse roles as its expression is upregulated in some cancers and downregulated in others.Its activation has previously been reported in bladder and skin cancers [67].In the bladder cancer, the FGFR3 is considered as an actionable target [68].In triple negative breast cancer, aberrant FGFR3 activation was identified through the mass spectrometry (MS)-based tyrosine phosphorylation profiling [69].Further, reduced sensitivity to tamoxifen (an estrogen blocker) was observed in vitro in the michigan cancer foundation-1 cells (MCF7) with FGF1.From the same study, the tissue microarray expression data of tamoxifen resistant breast tumors demonstrated that FGFR3 expression was significantly increased compared with the normal cells, resulting in activation of the mitogen-activated protein kinase (MAPK) and phosphoinositide 3-kinase (PI3K) signalling pathways, both of which have been implicated in tamoxifen resistance in breast cancer [70].These evidences suggest convincing role of FGFR3 in cancers predispositions and disease outcome.

Conclusion
The study, for the first time, estimates the predisposing mutations in the coding regions of the human kinases genes in the global populations by using the HGDP, and ExAC databases.The analysis highlighted that the Non-Finnish Europeans contained the highest number of deleterious SNVs (26), followed by Africans (14 SNVs), East Asians (13 SNVs), and South Asians (12 SNVs).The identification of deleterious mutations enabled exploration of the underlying mechanisms potentially involved in the onset of various sporadic cancers in the global populations.The gene set enrichment analysis highlighted the likely role of NTRK1 and FGFR3 proteins in sporadic carcinogenesis.The pathways identified in this analysis indicate the underlying molecular mechanisms which might be instrumental in managing the cancers without a known causative exposure.The identified drug targets may be used in personalized cancer management.The analysis based on the bioinformatics and in-silico tools is the limiting factor, and the findings can further be validated in the cell lines or animal models.In future, the identified variants can be screened in the patients to find the penetrance of variants in the populations.

Fig 3 .
Fig 3. Genomic positions of genes harboring the variants associated with various cancers as filtered from the ClinVar database.One circle beside the chromosomes denotes one gene, and the color indicates type of cancer.The loci on chromosomes 9 and 19 are richer in variants with clinical significance.https://doi.org/10.1371/journal.pone.0298747.g003

Fig 4 .
Fig 4. Graphical representation of TPM values of highly deleterious kinase genes (NTRK1 and FGFR3) involved in various cancers.The TPM values were retrieved from the Pan Cancer Analysis of Whole Genome Project compared with normal cells of the respective organs in the GTEx database.https://doi.org/10.1371/journal.pone.0298747.g004

Table 1 . The number of variants within the coordinates of human kinome genes set filtered from the two databases. Number of Kinome Genes 500 HGDP Database ExAC Database
https://doi.org/10.1371/journal.pone.0298747.t001